Computer and Modernization, 2012, Vol. 198, Issue (2): 38-39. doi: 10.3969/j.issn.1006-2475.2012.02.011

• Algorithm Design and Analysis •

An Algorithm of Drawing Website Information Based on Webpage File Code

ZHAO Xiao-feng, LING Tian-bin, PENG Bo, WANG Zhuan-ni

  1. Education Technology Center, Foreign Languages College of Chinese People’s Liberation Army, Luoyang 471003, China
  • Received: 2011-08-29 Online: 2012-02-24 Published: 2012-02-24

Abstract: This paper designs an information extraction algorithm based on an analysis of webpage source code. The goal is to replace manual collection of website information and so avoid repetitive work. The paper first compares and analyzes two existing kinds of Web structure, then proposes an extraction scheme for each of them. The schemes are then verified and implemented in code, using the website of NHK, the well-known Japanese news outlet, as an example. Finally, the paper considers how the system's functionality could be extended.

Key words: Web structure, information extraction, webpage markup
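The abstract does not give the algorithm itself, so the following is only a minimal Python sketch of one common way to extract information from webpage source: locate a pair of marks in the raw source and keep the text between them. The function names (`extract_between`, `strip_tags`), the marks, and the sample page are all hypothetical and are not taken from the paper or from the NHK website.

```python
from html.parser import HTMLParser


class _TextCollector(HTMLParser):
    """Collects the text nodes of an HTML fragment, ignoring the tags."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def strip_tags(fragment):
    """Return the text of an HTML fragment with whitespace collapsed."""
    collector = _TextCollector()
    collector.feed(fragment)
    collector.close()
    return " ".join("".join(collector.parts).split())


def extract_between(source, start_mark, end_mark):
    """Return the text between each start_mark/end_mark pair in source."""
    results = []
    pos = 0
    while True:
        start = source.find(start_mark, pos)
        if start == -1:
            break
        start += len(start_mark)
        end = source.find(end_mark, start)
        if end == -1:
            break
        results.append(strip_tags(source[start:end]))
        pos = end + len(end_mark)
    return results


# Hypothetical page source; the marks used below are invented for illustration.
SAMPLE = """
<ul>
  <li class="headline"><a href="/news/1.html">First <b>story</b></a></li>
  <li class="headline"><a href="/news/2.html">Second story</a></li>
</ul>
"""

headlines = extract_between(SAMPLE, '<li class="headline">', "</li>")
print(headlines)
```

A mark-based scan like this depends on the page keeping the same markup around the target content, so a site with a different structure would presumably need its own pair of marks.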
